svaseq: removing batch effects and other unwanted noise from sequencing data
Author
Abstract
It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of the subset of the data matrix. The resulting estimates of artifacts can be used in subsequent analyses as adjustment factors to correct analyses. Here I describe a version of the sva approach specifically created for count data or FPKMs from sequencing experiments based on appropriate data transformation. I also describe the addition of supervised sva (ssva) for using control probes to identify the part of the genomic data only affected by artifacts. I present a comparison between these versions of sva and other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package and I have made fully reproducible analysis using these methods available from: https://github.com/jtleek/svaseq.
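The two-step idea in the abstract can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the sva package's implementation: it assumes a log(count + 1) transform as the "appropriate data transformation", removes the known covariates by least squares, and takes the leading left singular vectors of the residuals as artifact estimates. The function name `estimate_artifacts` and the optional `controls` mask (a crude stand-in for the control probes of the supervised variant) are hypothetical names introduced here. The data are simulated purely for demonstration.

```python
import numpy as np


def estimate_artifacts(counts, design, n_factors=1, controls=None):
    """Simplified sketch of surrogate-variable estimation (not the sva package).

    counts   : genes x samples array of non-negative counts or FPKMs
    design   : samples x p matrix of known covariates, including an intercept
    controls : optional boolean mask over genes (hypothetical parameter);
               if given, only those genes are used to estimate artifacts
    Returns a samples x n_factors matrix of estimated artifact factors.
    """
    # Assumed transformation: log(count + 1)
    y = np.log(np.asarray(counts, dtype=float) + 1.0)
    if controls is not None:
        y = y[np.asarray(controls, dtype=bool)]
    x = np.asarray(design, dtype=float)

    # Remove the part of each gene explained by the known covariates
    beta, *_ = np.linalg.lstsq(x, y.T, rcond=None)   # p x genes
    resid = y.T - x @ beta                           # samples x genes

    # Leading left singular vectors of the residuals estimate the artifacts
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_factors]


# Simulated example: 200 genes, 20 samples, one biological group and one
# unmodeled batch variable (all values are synthetic).
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 20
group = np.tile([0.0, 1.0], n_samples // 2)          # known covariate
batch = np.repeat([0.0, 1.0], n_samples // 2)        # hidden artifact
log_mu = (4.0
          + np.outer(rng.normal(size=n_genes), group)
          + np.outer(rng.normal(size=n_genes), batch))
counts = rng.poisson(np.exp(log_mu))

design = np.column_stack([np.ones(n_samples), group])
sv = estimate_artifacts(counts, design, n_factors=1)
batch_corr = abs(np.corrcoef(sv[:, 0], batch)[0, 1])
print(f"|correlation| between estimated factor and true batch: {batch_corr:.2f}")
```

The estimated factors could then be appended as extra columns of the design matrix in a downstream model, which is the "adjustment factors" use the abstract describes. The sketch treats every gene equally unless a `controls` mask is supplied, so it skips the step of identifying which genes are affected only by artifacts.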
Similar resources
A Comparative Study on Performance of Two Aerobic Sequencing Batch Reactors with Flocculated and Granulated Sludge Treating an Industrial Estate Wastewater: Process Analysis and Modeling
In this study, the performance of two aerobic sequencing batch reactors (SBR) in removing carbon and nutrient (N & P) from Faraman’s industrial estate wastewater (FIW) with flocculated and granulated sludge were compared. The comparison study was performed by varying two significant independent variables (aeration time and mixed liquor volatile suspended solids (MLVSS)). The experiments were co...
The sva package for removing batch effects and other unwanted variation in high-throughput experiments
Heterogeneity and latent variables are now widely recognized as major sources of bias and variability in high-throughput experiments. The best-known source of latent variation in genomic experiments is batch effects, which arise when samples are processed on different days, in different groups or by different people. However, there are also a large number of other variables that may have a major impac...
An Enhanced Median Filter for Removing Noise from MR Images
In this paper, a novel decision based median (DBM) filter for enhancing MR images has been proposed. The method is based on eliminating impulse noise from MR images. A median-based method to remove impulse noise from digital MR images has been developed. Each pixel is leveled from black to white like gray-level. The method is adjusted in order to decide whether the median operation can be appli...
Removing Unwanted Variation from High Dimensional Data with Negative Controls
High dimensional data suffer from unwanted variation, such as the batch effects common in microarray data. Unwanted variation complicates the analysis of high dimensional data, leading to high rates of false discoveries, high rates of missed discoveries, or both. In many cases the factors causing the unwanted variation are unknown and must be inferred from the data. In such cases, negative cont...
On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data
Single-cell RNA-Sequencing (scRNA-Seq) has become the most widely used high-throughput method for transcription profiling of individual cells. Systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies. Surprisingly, these issues have received minimal attention in published studies based on scRNA-Seq technology. We examined data fr...
Journal title:
Volume 42, Issue
Pages -
Publication date: 2014